Expenditure on cloud computing services reached a mammoth 225 billion dollars in 2022.

Companies start their cloud-native journeys with the best intentions and reap many benefits, including:

  • Adaptability and scalability
  • Automation and flexibility
  • Quicker time to value
  • Enhanced customer experiences

Enterprise spending on cloud and data centers by segment from 2009 to 2022

But current cloud expenditure growth levels are unsustainable for many organizations, and with 82% of organizations investing in FinOps staff, cloud spend is clearly top of mind in the C-suite. Organizations are starting to implement FinOps frameworks, aiming to get maximum business value by helping engineering, technology, finance, and business teams collaborate on data-driven spending decisions.

Unsurprisingly, with cloud expenditure rising rapidly, cloud resource wastage, and specifically Kubernetes resource wastage, is rising too.

Why are cloud costs rising?

Recent research by Civo has identified several root causes for spiraling cloud costs, including:

  1. Overly complex and opaque pricing (primarily by hyperscalers).
  2. Unwieldy and complex setups that are not fully understood by end users, which can cause applications to run up huge costs (especially for data transfer, compute, and storage).
  3. Inability to easily predict future costs, meaning unexpected and unreasonable bills.

Results from the Cost of Cloud report 2022

Drilling down, the rising costs come from the amount of resources “requested” by the apps deployed onto Kubernetes, because when reserving capacity Kubernetes does not distinguish between resources requested and resources actually used.

While Kubernetes implements autoscaling techniques that are meant to minimize resources, real-world results do not support the notion that Kubernetes is cost-effective out of the box.

This is because, at the smallest unit of measure, the pod, human beings still impose simplistic models for resource allocation or, worse, carry over legacy sizing practices from their experience with monolithic, virtual machine-based platforms.
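
To make that concrete, here is a minimal sketch of where requests and limits live in a workload manifest. The workload name, image, and numbers are hypothetical, but the mechanics are standard: the scheduler reserves node capacity based on the requests, whether or not the container ever uses it.

```yaml
# Hypothetical Deployment snippet; the name, image, and values are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0   # placeholder image
          resources:
            requests:
              cpu: "1"        # a full core is reserved per replica, used or not
              memory: 2Gi     # 2 GiB is reserved per replica, used or not
            limits:
              cpu: "2"
              memory: 4Gi
```

If actual usage sits at, say, 200m of CPU and 512Mi of memory, the cluster still has to provide the full requested capacity for every replica, and that unused headroom is what you pay for.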

Solving this growing problem isn’t easy

Why not? For a few reasons:

  1. Developers typically aren't incentivized to size workloads correctly, so they over-provision to ensure a seamless customer experience (why wouldn’t they?)
  2. SREs and platform engineers who know Kubernetes often don’t know enough about the applications to right-size them
  3. Complexity across the autoscaling techniques in Kubernetes means it’s hard to understand how the various knobs and levers relate to one another, such as vertical versus horizontal scaling (see the sketch after this list)
  4. Right-sizing one workload at a time is incredibly inefficient, and it only gets worse as you scale, until it becomes unmanageable
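
To illustrate point 3, here is a hedged sketch of the two most common scaling knobs pointed at the same hypothetical workload: a HorizontalPodAutoscaler that changes the replica count, and a VerticalPodAutoscaler (a separate add-on) that rewrites the requests themselves. The names and thresholds are illustrative, and the knobs interact: the HPA’s utilization target is a percentage of the requested CPU, so badly sized requests distort horizontal scaling too, and the two autoscalers should not manage the same metric on the same workload.

```yaml
# Horizontal scaling: adjust the replica count (illustrative values).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percentage of the *requested* CPU
---
# Vertical scaling: rewrite the requests/limits themselves (requires the VPA add-on).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-api
  updatePolicy:
    updateMode: "Auto"   # the VPA evicts pods to apply new requests
```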

If you are a Kubernetes user and these challenges sound all too familiar, read on and I will discuss how StormForge, an automated Kubernetes resource management platform, can help stop this storm brewing into a cloud cost catastrophe.

Let’s take a look at the three key areas needed to address the problem: visibility, intelligence, and automation & optimization.

Visibility

Enabling visibility nearly always involves a third-party solution, since the cost visibility tools from cloud providers don’t display cost data in real time.

With Kubernetes, having visibility into usage, current requests, and current limits for CPU and memory is key to making insightful recommendations about where savings could be made per workload.

Comprehensive visibility tools will then show a complete overview of current total requests and what those totals would look like with optimization applied. In StormForge’s Optimize Live, for example, you’ll find an overview of the top clusters, top namespaces, top over-provisioned workloads, and top under-provisioned workloads.

Visibility into cloud costs

Intelligence

Once you have visibility into the usage data, you need to look at the current requests and limits and then set them to more appropriate levels based on actual usage.

Depending on the criticality of the workload, you can take a more aggressive approach to achieve higher savings, or, for business-critical applications, a more conservative approach with guardrails to favor reliability.
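
As a rough illustration of what that can look like, and not output from Optimize Live, here is a hypothetical strategic merge patch that right-sizes the example workload from earlier against observed usage of roughly 200m CPU and 512Mi of memory. How much headroom you leave above usage is exactly the aggressive-versus-conservative dial.

```yaml
# recommended-resources-patch.yaml: illustrative values only, not a real recommendation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api
spec:
  template:
    spec:
      containers:
        - name: api
          resources:
            requests:
              cpu: 250m       # aggressive: small headroom above ~200m observed usage
              memory: 640Mi   # modest headroom above ~512Mi observed usage
            limits:
              cpu: "1"        # conservative ceiling retained as a guardrail for spikes
              memory: 1Gi
```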

Applied intelligence will enable actionable recommendations to optimize resources as usage varies.

It is important for applied intelligence to understand application behavior. This is most effective with machine learning producing frequent recommendations on a cadence that matches your right-sizing requirements (hourly, daily, or weekly). When you are managing thousands of workloads, having machine learning in place to right-size pods is a competitive advantage.

Intelligence for cloud costs

Above is an example of the per-container recommendation details shown in the StormForge Optimize Live UI. For CPU and memory, we show usage, current requests, current limits, recommended requests, and recommended limits.

Automation & Optimization

To proactively and continuously right-size pods, improve efficiency, and eliminate cloud waste, automation is key.

The ability to export recommendations into your CI/CD pipeline as native Kubernetes YAML objects is how many of our customers apply recommendations. Some do it manually, while many trust the applier to apply recommendations automatically on a schedule, ensuring their workloads are always right-sized.
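
As a sketch of what that can look like, assuming a Kustomize-based pipeline and the hypothetical patch file from the Intelligence section, the exported recommendation is just another YAML file that gets committed, reviewed, and rolled out like any other change:

```yaml
# kustomization.yaml (hypothetical repository layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml                            # the workload manifest
patches:
  - path: recommended-resources-patch.yaml     # the exported recommendation
    target:
      kind: Deployment
      name: example-api
```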

Automation & optimization of cloud costs

Above is an example of recommendation configuration options for workloads and how to enable automatic deployment.

Summary

Companies continue to accelerate their cloud journeys.

Kubernetes is an outstanding innovation, but it won’t make applications more cost-efficient on its own; without visibility, intelligence, and automation, it can result in significant spend on wasted resources.

To reduce cloud waste, building cost consciousness into the culture of your company is key, and you need visibility into your Kubernetes workloads to drive decisions on how cloud expenditure can be controlled.

StormForge Optimize Live helps our customers realize tremendous savings, both in cloud expenditure and operationally, saving on average 50% of cluster resources by right-sizing every workload on the cluster.

Thanks for reading this blog post. Visit our website to learn more or reach out to me directly.